Experiments for an approach to language identification with conversational telephone speech
Abstract
This paper presents our recent work on language identification using conversational speech (the LDC Conversational Telephone Speech Database). The baseline system used in this study was developed recently ([4, 5]). It is based on language-dependent phone recognition and phonotactic constraints. The system was trained on monologue data and obtained an error rate of around 9% on a commonly used nine-language monologue test set. When the system was used to process conversational speech from the same nine-language task, its performance degraded dramatically, with the error rate rising to 40%. Based on our analysis of conversational speech, we propose two methods: (1) pre-processing and (2) post-processing. Without using any training data from the conversational speech database, the final system (the baseline system enhanced by the two proposed methods) obtained an error rate of 24%, a substantial improvement over the baseline (a 41% relative error reduction).
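The abstract only names the baseline's two ingredients: language-dependent phone recognition and phonotactic constraints. As a rough illustration of the phonotactic half, the sketch below scores an already-decoded phone sequence against one phone n-gram model per language and picks the most likely language. Everything concrete here is an assumption, not taken from the paper: the bigram order, add-one smoothing, the class and function names (`PhoneBigramLM`, `identify`), and the toy phone strings are all hypothetical. The phone recognizers that would produce the input sequences are not modeled.

```python
import math
from collections import Counter

BOS = "<s>"  # assumed start-of-utterance symbol for the first bigram


class PhoneBigramLM:
    """Hypothetical per-language phone bigram model with add-one smoothing."""

    def __init__(self, sequences, phone_set):
        self.vocab = set(phone_set)
        self.history_counts = Counter()  # how often each phone (or BOS) precedes another
        self.bigram_counts = Counter()   # counts of (previous phone, phone) pairs
        for seq in sequences:
            prev = BOS
            for phone in seq:
                self.history_counts[prev] += 1
                self.bigram_counts[(prev, phone)] += 1
                prev = phone

    def log_prob(self, prev, phone):
        # Add-one (Laplace) smoothing over the phone inventory.
        v = len(self.vocab)
        num = self.bigram_counts[(prev, phone)] + 1
        den = self.history_counts[prev] + v
        return math.log(num / den)

    def score(self, seq):
        # Log-likelihood of the whole phone sequence under this language's phonotactics.
        total, prev = 0.0, BOS
        for phone in seq:
            total += self.log_prob(prev, phone)
            prev = phone
        return total


def identify(seq, models):
    """Return the best-scoring language and the per-language log-likelihoods."""
    scores = {lang: lm.score(seq) for lang, lm in models.items()}
    return max(scores, key=scores.get), scores


# Hypothetical toy data: two "languages" with very different phone orderings.
phones = ["a", "b", "k", "t", "i"]
models = {
    "A": PhoneBigramLM([list("ababab")], phones),
    "B": PhoneBigramLM([list("ktikti")], phones),
}

if __name__ == "__main__":
    best, scores = identify(list("abab"), models)
    print(best, scores)
```

A real system would typically length-normalize the scores and, in a parallel-recognizer setup, combine scores from several phone recognizers; both are omitted here to keep the phonotactic scoring step visible.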